Segmenting a composite image via minimum areas.

The invention relates to a method of segmenting a composite image of pixels
into a number of fields corresponding to layout elements of the image, the pixels
having a value representing the intensity and/or color of a picture element, which
method comprises finding field separators corresponding to areas of adjacent pixels
of the image, having a predefined property indicative of a background of the image.

The invention further relates to a device for segmenting a composite image of
pixels into a number of fields corresponding to layout elements of the image, the
pixels having a value representing the intensity and/or color of a picture element,
which device comprises an input unit for inputting an image, and a processing unit for
finding field separators corresponding to areas of adjacent pixels having a predefined
property indicative of a background of the image.

The invention further relates to a computer program product.

A method for page segmentation is known from the article "Flexible page
segmentation using the background" by A. Antonacopoulos and R.T. Ritchings in
"Proceedings 12th International Conference on Pattern Recognition, Jerusalem,
Israel, October 9-12, IEEE-CS Press, 1994, vol. 2, pp. 339-344". The image is
represented by pixels that have a value representing the intensity and/or color of a
picture element. The value is classified as background (usually white) or foreground
(usually black, being printed space). The white background space that surrounds the
printed regions on a page is analyzed. The background white space is covered with
tiles, i.e. non-overlapping areas of background pixels.

The contour of a foreground field in the image is identified by tracing along
the white tiles that encircle it, such that the inner borders of the tiles constitute the
border of a field for further analysis. A problem of the method is that the borders of
the fields are represented by a complex description which frustrates efficient further
analysis.

It is an object of the invention to provide a method and device for segmenting
an image which is more reliable and less complicated.

According to a first aspect of the invention the object is achieved with a
method as defined in the opening paragraph, characterized by the further steps of
extending the field separators along at least one separation direction to an outer
border of the image, constructing a tessellation grid of lines corresponding to the
(extended) field separators, constructing a set of basic rectangles, a basic rectangle
being an area enclosed by lines of the tessellation grid, and constructing the fields by
connecting basic rectangles that are adjacent and not separated by a field separator.

According to a second aspect of the invention the object is achieved with a
device as defined in the opening paragraph, characterized in that the processing unit
is arranged for extending the field separators along at least one separation direction
to an outer border of the image, constructing a tessellation grid of lines corresponding
to the (extended) field separators, constructing a set of basic rectangles, a basic
rectangle being an area enclosed by lines of the tessellation grid, and constructing
the fields by connecting basic rectangles that are adjacent and not separated by a
field separator.

According to a third aspect of the invention the object is achieved with a
computer program product for performing the method.

Normally, an image contains field separators having one of at least two
separation directions, usually horizontal and vertical, that connect and/or cross and
together enclose the layout elements, such as text fields. The effect of the present
method is that a tessellation grid is formed by lines based on extending the field
separators to the outer borders. Every area enclosed but not sub-divided by the grid
is called a basic rectangle, and further analysis is performed on these basic
rectangles. The advantage of the set of basic rectangles is that fields can be easily
constructed by connecting the basic rectangles. It is to be noted that calculation on
the level of basic rectangles is computationally substantially more efficient than
connecting individual pixels or small pixel based objects.

The invention is based on the following recognition. Segmentation is the
process of identifying objects in the image at a relevant hierarchical level. For
example in a newspaper page a hierarchy could be a lowest level of pixels, then a
level of objects of connected pixels (e.g. characters or separators), then text lines,
then text fields, then columns and finally articles. The inventors have seen that for
finding fields in a structured image a building block that is just below the required
level of fields can be constructed by a transformation from the lower level of field
separators to a building block level. The basic rectangles are the building blocks that
can be efficiently constructed via the tessellation grid. The step of connecting basic
rectangles to an area takes place on the building block level. Finally a transformation
from the building block level to the field level is achieved by consolidating basic
rectangles into fields on the basis of the original connection points of field separators
or nodes of the image. Hence, the construction of basic rectangles provides a
convenient way of determining building blocks of fields during segmenting a digital
image which predominantly has polygon fields.

In an embodiment of the method the step of constructing the set of basic
rectangles comprises constructing a matrix map representing the tessellation grid by
a two-dimensional array of elements that each represent either a basic rectangle or a
line segment of the tessellation grid, an element having a first predefined value for
representing a line corresponding to a field separator or a further different value for
representing a basic rectangle or a line corresponding to an extended field separator.
The advantage is that the matrix map comprises the basic rectangles and the
boundaries between the basic rectangles. The matrix map can be processed easily
because it represents the image on a level of building blocks of fields without
geometric details that would otherwise complicate calculations.
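
As an illustration only, the matrix map and the connecting of basic rectangles might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `build_matrix_map`, `connect_basic_rectangles`, `SEPARATOR` and `OPEN` are hypothetical, field separators are assumed to be axis-aligned line segments `(x1, y1, x2, y2)` that end on grid lines, and the convention that odd array indices hold basic rectangles and even indices hold grid-line segments is an assumption made for the example.

```python
from collections import deque

SEPARATOR, OPEN = 1, 0  # assumed element values of the matrix map


def build_matrix_map(width, height, separators):
    """Extend each separator to the image border and build the matrix map.

    separators: axis-aligned segments (x1, y1, x2, y2), assumed to end on grid lines.
    Odd (row, col) indices are basic rectangles; even indices are grid lines.
    """
    xs = sorted({0, width} | {x1 for x1, y1, x2, y2 in separators if x1 == x2})
    ys = sorted({0, height} | {y1 for x1, y1, x2, y2 in separators if y1 == y2})
    rows, cols = 2 * len(ys) - 1, 2 * len(xs) - 1
    grid = [[OPEN] * cols for _ in range(rows)]
    for x1, y1, x2, y2 in separators:
        if x1 == x2:  # vertical separator: mark the line segments it really covers
            c = 2 * xs.index(x1)
            for r in range(1, rows, 2):
                if y1 <= ys[r // 2] and ys[r // 2 + 1] <= y2:
                    grid[r][c] = SEPARATOR
        elif y1 == y2:  # horizontal separator
            r = 2 * ys.index(y1)
            for c in range(1, cols, 2):
                if x1 <= xs[c // 2] and xs[c // 2 + 1] <= x2:
                    grid[r][c] = SEPARATOR
    return xs, ys, grid


def connect_basic_rectangles(grid):
    """Label basic rectangles; adjacent ones share a label unless a separator divides them."""
    rows, cols = len(grid), len(grid[0])
    labels, fields = {}, 0
    for r in range(1, rows, 2):
        for c in range(1, cols, 2):
            if (r, c) in labels:
                continue
            fields += 1
            labels[(r, c)] = fields
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                    nr, nc = cr + 2 * dr, cc + 2 * dc
                    inside = 0 < nr < rows and 0 < nc < cols
                    # The element between two rectangles tells whether they are separated.
                    if inside and (nr, nc) not in labels and grid[cr + dr][cc + dc] == OPEN:
                        labels[(nr, nc)] = fields
                        queue.append((nr, nc))
    return labels
```

Working on this small array rather than on pixels reflects the efficiency argument made above: the number of elements depends only on the number of separators, not on the image resolution.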

In an embodiment of the method, nodes are defined at points in the original
image at positions where the field separators connect and at corresponding positions
in the tessellation grid, and the step of constructing the fields comprises constructing
a node matrix corresponding to the tessellation grid and including elements referring
to nodes in the tessellation grid.

The advantage is that the node matrix comprises references to the nodes in a
geometric representation. The node matrix allows an easy transformation of the level
of building blocks of fields, i.e. basic rectangles, to a representation of the fields by
nodes.

Further preferred embodiments of the device according to the invention are
given in the further claims.

These and other aspects of the invention will be apparent from and elucidated
further with reference to the embodiments described by way of example in the
following description and with reference to the accompanying drawings, in which
Figure 1 shows an overview of an exemplary segmentation method,
Figure 2 shows a part of a sample Japanese newspaper,
Figure 3 shows the merging of objects along a single direction,
Figure 4 shows segmentation and two directional merging of objects,
Figure 5 shows construction of a maximal rectangle from white runs,
Figure 6 shows construction of maximal white rectangles,
Figure 7 shows cleaning of overlapping maximal white rectangles,
Figure 8 shows a graph on a newspaper page,
Figure 9 shows two types of intersection of maximal rectangles,
Figure 10 shows a device for segmenting a picture,
Figure 11 shows a diagram of a method for defining fields on the basis of field
separators,
Figure 12 shows a representation of an image,
Figure 13 shows a tessellation grid on an image,
Figure 14 shows a matrix map of the tessellation grid,
Figure 15 shows a single connected area in a matrix,
Figure 16 shows the contour of a connected area, and
Figure 17 shows a node matrix.



The Figures are diagrammatic and not drawn to scale. In the Figures, elements
which correspond to elements already described have the same reference numerals.

Figure 1 shows an overview of an exemplary segmentation method, showing
three basic steps from known segmentation systems. The input image 11 is
processed in a CCA module 14 that analyses the pixels of the image using
Connected Component Analysis. First an original picture that may be a
black-and-white, grayscale or coloured document, e.g. a newspaper page, is scanned,
preferably in grayscale. Grayscale scanned pictures are halftoned for assigning a
foreground value (e.g. black) or a background value (e.g. white) to each pixel. The
CCA module 14 finds foreground elements in the image by detecting connected
components (CC) of adjacent pixels having similar properties. An example of the first
steps in the segmentation process is described in US 5,856,877. The
CCA module produces as output CC Objects 12, which are connected components of
connected foreground pixels. An LA module 15 receives the CC Objects 12 as input
and produces Layout Objects 13 by merging and grouping the CC Objects to form
larger layout objects such as text lines and text blocks. During this phase, heuristics
are used to group layout elements to form larger layout elements. This is a logical
step in a regular bottom-up procedure. An AF module 16 receives the Layout Objects
13 as input and produces Articles 17 as output by article formation. In this module,
several layout objects that constitute a larger entity are grouped together. The larger
entity is assembled using layout rules that apply to the original picture. For example
in a newspaper page the AF module groups the text blocks and graphical elements
like pictures to form the separate articles, according to the layout rules of that specific
newspaper style. Knowledge of the layout type of the image, e.g. Western type
magazine, scientific text or Japanese article layouts, can be used for a rule-based
approach of article formation resulting in an improved grouping of text blocks.

According to the invention additional steps are added to the segmentation as
described below. The steps relate to segmentation of the image into fields before
detecting elements within a field, i.e. before forming layout objects that are
constituted by smaller, separated but interrelated items. Figure 2 shows a sample
Japanese newspaper. Such newspapers have a specific layout that includes text
lines in both horizontal reading direction 22 and vertical reading direction 21. The
problem for a traditional bottom-up grouping process of detected connected
components is that it is not known in which direction the grouping should proceed.
Hence the segmentation is augmented by an additional step of processing the
background for detecting the fields in the page. Subsequently the reading direction
for each field of the Japanese paper is detected before the grouping of characters is
performed.

In an embodiment of the method, separator elements, e.g. black lines 23 for
separating columns, are detected and converted into background elements. With this
option, it is possible to separate large elements of black lines 23 containing vertical
and horizontal lines that are actually connected into different separator elements. In
Japanese newspapers, lines are very important objects for separating fields in the
layout. It is required that these objects are recognized as lines along separation
directions. Without this option, these objects would be classified as graphics. Using
the option the lines can be treated as separator elements in the different orientations
separately for each separation direction.

Figure 3 shows a basic method of merging objects in a single direction. The
Figure depicts the basic function of the LA module 15 for finding the layout objects
oriented in a known direction, such as text blocks for the situation that the reading
order is known. Connected components 12 are processed in a first, analysis step 31
by statistical analysis resulting in computed thresholds 32. In a second, classification
step 33 the CC classification is corrected resulting in the corrected connected
components 34, which are processed in a third, merging step 35 to join characters to
text lines, resulting in text lines and other objects 36. In a fourth, text merging step 37
the text lines are joined to text blocks 38 (and possibly other graphical objects).
According to the requirements for Japanese newspapers the traditional joining of
objects must be along at least two reading directions, and the basic method
described above must be improved therefor.

Figure 4 shows segmentation and two directional joining of objects. New
additional steps have been added compared to the single directional processing in
Figure 3. In a first (pre-)processing step a graph 41 of the image is constructed. The
construction of the graph by finding field separators is described below. In the graph,
fields are detected in field detection step 42 by finding areas that are enclosed by
edges of the graph. The relevant areas are classified as fields containing text blocks
47. In the text block 47 (using the connected components 43 or corrected connected
components 34 that are in the text block area) the reading order 45 is determined in
step 44. The reading direction detection is based upon the document spectrum, e.g.
on the method of O'Gorman and Kasturi described in "Document Image Analysis",
IEEE Computer Society Press, Los Alamitos, 1995. Using the fields of the text blocks
47, the contained connected components 43 and the reading order 45 as input, the
Line Build step 46 joins the characters to lines as required along the direction found.

Now the constructing of the graph 41 is described. A graph-representation of
a document is created using the background of a scan. Pixels in the scan are
classified as background (usually white) or foreground (usually black). Because only
large areas of white provide information on fields, small noise objects are removed,
e.g. by down-sampling the image. The down-sampled image may further be
de-speckled to remove single foreground (black) pixels.

The next task is to extract the important white areas. The first step is to detect
so-called white runs, one pixel high areas of adjacent background pixels. White runs
that are shorter than a predetermined minimal length are excluded from the
processing.
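
The detection of white runs can be sketched as follows. This is a minimal sketch under assumptions that are not in the text: the binary image is given as a list of rows, the value 0 is taken to mean background (white) and any other value foreground, and the function name `white_runs` and the parameter `min_length` are hypothetical. Each run is returned as a rectangle `(x1, y1, x2, y2)` with `y2 = y1 + 1`, the right and bottom coordinates being exclusive.

```python
def white_runs(image, min_length=1):
    """Return one-pixel-high runs of background pixels as (x1, y1, x2, y2)."""
    runs = []
    for y, row in enumerate(image):
        start = None
        # A sentinel foreground pixel closes a run that reaches the right border.
        for x, pixel in enumerate(list(row) + [1]):
            if pixel == 0 and start is None:
                start = x  # a new white run begins
            elif pixel != 0 and start is not None:
                if x - start >= min_length:  # drop runs shorter than the minimum
                    runs.append((start, y, x, y + 1))
                start = None
    return runs
```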

Figure 5 shows, as an example, four horizontal runs 51 of white pixels that
are adjacent in vertical direction. Foreground area 53 is assumed to have foreground
pixels directly surrounding the white runs 51. A "maximal white rectangle" is defined
as the largest rectangular area that can be constructed from the adjacent white runs
51, hence a rectangular white area that cannot be extended without including black
(foreground) pixels. A maximal white rectangle 52 is shown based on the four white
runs 51 having a length as indicated by the vertical dotted lines and a width of 4
pixels. When a white rectangle cannot be extended it has a so-called maximal
separating power. Such a rectangle is not a smaller part of a more significant white
area. Hence the rectangle 52 is the only possible maximal rectangle of width 4.
Further rectangles can be constructed of width 3 or 2. A further example is shown in
Figure 6.

The construction of white rectangles is done separately in different separation
directions, e.g. horizontal and vertical white rectangles. Vertical white rectangles are
detected by rotating the image, and detecting horizontal white runs for the rotated
image. It is noted that depending on the type of image or application also other
separation directions may be selected, such as diagonal.

An algorithm for constructing maximal white rectangles is as follows. The
input of the algorithm consists of all horizontal one pixel high white runs (WR)
detected from a given image. Each white run is represented as a rectangle
characterized by a set of coordinates ((x_1,y_1),(x_2,y_2)), where x_1 and y_1 are the
coordinates of its top left corner and x_2 and y_2 are the coordinates of its bottom right
corner. Each white run present in the active ordered object INPUT LIST is tested on
an extension possibility. The extension possibility is formulated in the condition
whether a given WR, labeled by p, can produce a maximal white rectangle (MWR) or
not. If the extension possibility is FALSE, p is already a maximal one; p is deleted
from the active INPUT LIST and written to the active RESULT LIST. If the extension
possibility is TRUE, the test for extension is repeated until all MWRs initiated by p
have been constructed. Then p is deleted from the INPUT LIST and all MWRs
obtained from p are written to the active RESULT LIST. When all white rectangles
from the INPUT LIST have been processed, the RESULT LIST will contain all MWRs.
To increase the efficiency of the algorithm, a sort on the y value is applied to the
INPUT LIST. First, the algorithm is applied for horizontal WRs, i.e. for white runs with
width larger than height. After a 90° turn of the image it can be applied to vertical
WRs.

In an embodiment the algorithm for constructing the maximal rectangles is as
follows. The rectangle data are stored as a linked list, with, at least, the coordinates
of the rectangle vertices contained in it. The INPUT and RESULT LISTs are stored
as a linked list too, with, at least, three elements, such as the number of white
rectangles, and pointers on the first and the last element in the linked list. The
following steps are executed: Activate INPUT LIST; Initiate RESULT LIST; Initiate
BUFFER for temporary coordinates of the selected rectangle. Start from the first
white rectangle, labeled by p_1, out of the active ordered INPUT LIST. The next white
rectangle on the list is labeled by p_2. For each white rectangle on the INPUT LIST
examine if p_1 has extension possibility. For the active white rectangle p_1, find the first
one labeled by p_{n_j}, on the active ordered INPUT LIST, which satisfies

y_2(p_1) = y_1(p_{n_j}),

x_1(p_{n_j}) ≤ x_2(p_1).

This search results in the set {p_{n_1}, p_{n_2}, ..., p_{n_l}}. Only if the set
{p_{n_1}, p_{n_2}, ..., p_{n_l}} is not empty, p_1 is said to have extension possibility.



22-11-2002 u:37 OCE CORPORATE PATENTS + 31773595 497 P 

8 012 22. 11 2002 



- If p_1 does not have an extension possibility, then p_1 is a maximal white rectangle.
Write p_1 to the RESULT LIST, remove p_1 from the INPUT LIST, and proceed
with p_2.

- If p_1 is extendible, then apply the extension procedure to p_1. Proceed with p_2. We
note here that p_1 can have an extension possibility while being maximal itself.

The Extension Procedure is as follows. Suppose p_1 has an extension possibility; then
there is the set {p_{n_1}, p_{n_2}, ..., p_{n_l}}. The extension procedure is applied to each
element of {p_{n_1}, p_{n_2}, ..., p_{n_l}} consistently. For the white rectangle p_1, which is
extendible with rectangle p_{n_j}, j = 1, ..., l, construct a new rectangle p_{1,n_j} with
coordinates:

x_1(p_{1,n_j}) = max{x_1(p_1), x_1(p_{n_j})},
x_2(p_{1,n_j}) = min{x_2(p_1), x_2(p_{n_j})},
y_1(p_{1,n_j}) = y_1(p_1),
y_2(p_{1,n_j}) = y_2(p_{n_j}).

Write the coordinates of p_{1,n_j}, j = 1, ..., l to the "coordinates" buffer. Repeat the test on
extension possibility now for p_{1,n_j}. If the test fails, p_{1,n_j} is maximal: write p_{1,n_j} to
the RESULT LIST; otherwise, extend p_{1,n_j}.

Before applying the extension procedure to p_{1,n_j}, we check p_1 and p_{n_j} for
absorption effect. The test of p_1 and p_{n_j} for absorption effect with p_{1,n_j} is as follows. By
absorption effect we mean the situation in which p_1, p_{n_j} or both is (are) completely
contained in p_{1,n_j}. In coordinates this means:

x_1(p_{1,n_j}) ≤ x_1(p_k),
x_2(p_{1,n_j}) ≥ x_2(p_k), where k = 1, n_j and j = 1, ..., l.

If the condition is TRUE for p_1, then p_1 is absorbed by p_{1,n_j}. Remove p_1 from the
INPUT LIST. If the condition is TRUE for p_{n_j}, then p_{n_j} is absorbed by p_{1,n_j}. Remove
p_{n_j} from the INPUT LIST.
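
The extension test, the extension formula and the absorption test described above can be sketched as three small functions. This is a hedged illustration, not the embodiment's linked-list implementation: rectangles are plain tuples `(x1, y1, x2, y2)`, the function names are hypothetical, and "overlap in x" is assumed to mean a non-empty common x-interval (strict inequalities on both sides), which is an assumption of this sketch.

```python
def has_extension_possibility(p, q):
    """True if q starts on the row where p ends and shares part of p's x-range."""
    return p[3] == q[1] and q[0] < p[2] and q[2] > p[0]


def extend(p, q):
    """Rectangle spanning p and q vertically, restricted to their common x-range."""
    return (max(p[0], q[0]), p[1], min(p[2], q[2]), q[3])


def is_absorbed(pk, extended):
    """True if pk lies completely within the x-range of the extended rectangle."""
    return extended[0] <= pk[0] and extended[2] >= pk[2]
```

For two runs of equal x-range stacked directly on top of each other, `extend` returns a rectangle two rows high that absorbs both, so both can be removed from the INPUT LIST.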

The algorithm assumes that the rectangle is wider than it is high, and thus the
rectangles are primarily horizontal. To construct MWRs in vertical direction, the
original binary image is rotated by 90° clockwise. The algorithm mentioned above is
repeated for the rotated image. As a result, all vertical MWRs for the original image
are constructed.

Figure 6 shows construction of maximal white rectangles. The pixel
coordinates are displayed along a horizontal x axis and a vertical y axis. Four white
runs 61 are shown left in the Figure. The white runs (WR) are described as
rectangles with the coordinates of their upper left and bottom right corners respectively:
WR_1: ((10,1),(50,2)),
WR_2: ((10,2),(50,3)),
WR_3: ((5,3),(30,4)),
WR_4: ((40,3),(60,4)).

All maximal white rectangles from these white runs are constructed. The resulting
five maximal white rectangles (MWR) are shown in the right part of the Figure as
indicated by 62, 63, 64, 65 and 66. The five MWR shown are the complete set of
MWR for the WR given in the left part of the Figure. A construction algorithm is as
follows.

Let the INPUT LIST contain the four white runs 61. The first element from the
INPUT LIST is WR_1 ((10,1),(50,2)). Label WR_1 as p_1. Examine p_1 on the extension
possibility as described above. The first candidate for extension is
WR_2 ((10,2),(50,3)). Label WR_2 as p_{n_1}. Extend p_1 with p_{n_1} according to the formula for
extension above, which gives a new rectangle p_{1,n_1} with the coordinates
((10,1),(50,3)). Test p_1 and p_{n_1} on the absorption effect with p_{1,n_1}. As follows from the
absorption test, both p_1 and p_{n_1} are absorbed by p_{1,n_1}. Therefore, delete p_1 and p_{n_1}
from the INPUT LIST. Proceed with p_{1,n_1}. Test p_{1,n_1} on the extension possibility, which
gives the first candidate WR_3 ((5,3),(30,4)). Label WR_3 as p_{t_1}. Extend p_{1,n_1} with p_{t_1}
according to the extension formula. As a result, we obtain a new rectangle p_{(1,n_1),t_1}
with the coordinates ((10,1),(30,4)). Test p_{1,n_1} and p_{t_1} on the absorption effect with
p_{(1,n_1),t_1}. The test fails.

Repeat the test on extension possibility for p_{(1,n_1),t_1}. The test fails, i.e. p_{(1,n_1),t_1} has no
extension possibility. It means that p_{(1,n_1),t_1} is maximal. Write p_{(1,n_1),t_1} with the
coordinates ((10,1),(30,4)) to the RESULT LIST.

Proceed again with p_{1,n_1} and test it on extension possibility. The second candidate
WR_4 ((40,3),(60,4)) is found. Label WR_4 as p_{t_2}. Extend p_{1,n_1} with p_{t_2} according to the
extension formula. As a result, we obtain a new rectangle p_{(1,n_1),t_2} with the
coordinates ((40,1),(50,4)).

Test p_{1,n_1} and p_{t_2} on the absorption effect with p_{(1,n_1),t_2}. The test fails, i.e. no
absorption. Repeat the test on extension possibility for p_{(1,n_1),t_2}; the test fails, i.e.
p_{(1,n_1),t_2} has no extension possibility. It means that p_{(1,n_1),t_2} is maximal. Write
p_{(1,n_1),t_2} with the coordinates ((40,1),(50,4)) to the RESULT LIST.

Test p_{1,n_1} again on extension possibility. The test fails and p_{1,n_1} is maximal. Write p_{1,n_1}
with the coordinates ((10,1),(50,3)) to the RESULT LIST.

Return to the INPUT LIST. The INPUT LIST at this stage contains two white runs, i.e.
WR_3: ((5,3),(30,4)) and WR_4: ((40,3),(60,4)). Start from WR_3 and label it as p_2. Repeat the
test on extension possibility for p_2. The test fails, so p_2 is maximal. Write p_2 with the
coordinates ((5,3),(30,4)) to the RESULT LIST. Remove p_2 from the INPUT LIST.
Proceed with WR_4 and label it as p_3. The test on extension possibility for p_3 shows that
p_3 is maximal. Write p_3 with the coordinates ((40,3),(60,4)) to the RESULT LIST.
Remove p_3 from the INPUT LIST. Finally, the RESULT LIST contains five maximal
white rectangles, i.e. MWR_1: ((10,1),(50,3)) indicated in Figure 6 as 64, MWR_2:
((10,1),(30,4)) indicated as 62, MWR_3: ((40,1),(50,4)) indicated as 63,
MWR_4: ((5,3),(30,4)) as 65, and MWR_5: ((40,3),(60,4)) as 66.
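
The result of the walk-through can be cross-checked with a compact sketch. Note that this is not the INPUT LIST / RESULT LIST procedure of the text but a simpler recursive search that is assumed to yield the same set of rectangles: a rectangle built from one run per row is recorded when no single run in the row above or below spans its whole x-range. The function name `maximal_white_rectangles` is hypothetical, and runs are tuples `(x1, y1, x2, y2)` with `y2 = y1 + 1`.

```python
from collections import defaultdict


def maximal_white_rectangles(runs):
    """Return the set of maximal white rectangles built from one-pixel-high runs."""
    by_row = defaultdict(list)
    for x1, y1, x2, y2 in runs:
        by_row[y1].append((x1, x2))

    def covered(y, x1, x2):
        # True if a single run in row y spans the whole interval [x1, x2].
        return any(a <= x1 and x2 <= b for a, b in by_row.get(y, ()))

    result = set()

    def grow(x1, x2, top, bottom):
        # [x1, x2] is the common x-range of one run per row for rows top..bottom.
        if not covered(top - 1, x1, x2) and not covered(bottom + 1, x1, x2):
            result.add((x1, top, x2, bottom + 1))
        for a, b in by_row.get(bottom + 1, ()):
            nx1, nx2 = max(x1, a), min(x2, b)
            if nx1 < nx2:  # the next row still shares part of the x-range
                grow(nx1, nx2, top, bottom + 1)

    for y, row in list(by_row.items()):
        for a, b in row:
            grow(a, b, y, y)
    return result
```

Applied to the four white runs of Figure 6, the sketch returns the same five rectangles as listed in the RESULT LIST above.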

Figure 7 shows a next step in the method according to the invention, namely
a cleaning step of overlapping maximal white rectangles. In the cleaning step, plural
overlapping maximal white rectangles are consolidated into a single so-called
"Informative Maximal Rectangle" (IWR) that combines the most relevant properties of
the original maximal white rectangles, as discussed below in detail.

The cleaning may further include steps like checking on size and spatial
relation. The upper part of Figure 7 shows, as an example, two maximal white
rectangles MWR1 and MWR2. The pair is consolidated into a single Informative
White Rectangle IWR in the cleaning step as shown in the lower part of the Figure.
The process of detecting overlap and consolidating is repeated until no relevant pairs
can be formed anymore. A criterion for forming pairs may be the size of the overlap
area.



Further cleaning steps may include removing thin or short rectangles or
rectangles that have an aspect ratio below a certain predefined value. The criteria for
removing are based on the type of image, e.g. a width below a predefined number of
pixels indicates a separator of text lines and is not relevant for separating fields, and
a length below a certain value is not relevant in view of the expected sizes of the
fields.



An algorithm for the cleaning step is as follows. The start of the cleaning
procedure is the whole set of MWRs constructed as described above with reference
to Figures 5 and 6. The cleaning procedure is applied to discard non-informative
MWRs. For this reason a measure of non-informativeness is defined. For example a
long MWR is more informative than a short one. A low aspect ratio indicates a more
or less square rectangle that is less informative. Further, extremely thin rectangles,
which for instance separate two text lines, must be excluded. First, all MWRs are
classified as being horizontal, vertical or square by computing the ratio between
their heights and widths. Square MWRs are deleted because of their
non-informativeness. For the remaining horizontal and vertical MWRs the cleaning
technique is applied which consists of three steps:

- Each MWR with a length or width below a given value is deleted.
- Each MWR with aspect ratio (AR), defined as the ratio of the longer side length
divided by the shorter side length, below a given value is deleted.
- For each pair of overlapping horizontal (or vertical) MWR_1 ((x_1,y_1),(x_2,y_2)) and
horizontal (or vertical) MWR_2 ((a_1,b_1),(a_2,b_2)), an informative white rectangle IWR is
constructed with the following coordinates:

(a) Horizontal overlap:
x'_1 = min{x_1, a_1},
y'_1 = max{y_1, b_1},
x'_2 = max{x_2, a_2},
y'_2 = min{y_2, b_2}.

(b) Vertical overlap:
x'_1 = max{x_1, a_1},
y'_1 = min{y_1, b_1},
x'_2 = min{x_2, a_2},
y'_2 = max{y_2, b_2}.

This process is repeated for all pairs of overlapping MWRs. The set of MWRs now
comprises Informative White Rectangles IWRs. These IWRs form the starting point
for an algorithm for segmentation of the image into fields corresponding to the layout
elements. The IWRs are potential field separators and are therefore called
"separating elements". Using the IWRs, the algorithm constructs a graph for further
processing into a geographical description of the image.
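
The three cleaning steps can be sketched as follows. This is a minimal sketch under assumptions: rectangles are tuples `(x1, y1, x2, y2)`, the thresholds `min_length`, `min_width` and `min_aspect_ratio` are hypothetical parameters standing in for the "given values" of the text, and the function names `clean_mwrs` and `consolidate` are illustrative. With a minimum aspect ratio above 1, square MWRs are discarded by the same filter.

```python
def clean_mwrs(mwrs, min_length, min_width, min_aspect_ratio):
    """Drop MWRs that are too short, too thin or too close to square."""
    kept = []
    for x1, y1, x2, y2 in mwrs:
        long_side = max(x2 - x1, y2 - y1)
        short_side = min(x2 - x1, y2 - y1)
        if short_side <= 0 or short_side < min_width or long_side < min_length:
            continue  # step 1: length or width below the given value
        if long_side / short_side < min_aspect_ratio:
            continue  # step 2: aspect ratio below the given value
        kept.append((x1, y1, x2, y2))
    return kept


def consolidate(r1, r2, horizontal=True):
    """Step 3: merge two overlapping MWRs into one informative white rectangle."""
    x1, y1, x2, y2 = r1
    a1, b1, a2, b2 = r2
    if horizontal:
        # Longest extent in x, common extent in y.
        return (min(x1, a1), max(y1, b1), max(x2, a2), min(y2, b2))
    # Common extent in x, longest extent in y.
    return (max(x1, a1), min(y1, b1), min(x2, a2), max(y2, b2))
```

The consolidated rectangle keeps the full length of the pair along the separation direction and only the shared thickness across it, which matches the idea that length carries the separating information.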

Figure 8 shows such a graph on a newspaper page. The picture shows a
down-sampled digital image 80 of a newspaper page. The original text is visible in
black in a down-sampled version corresponding to Figure 2. The informative
rectangles IWR constituting separating elements are shown in gray. For the
construction of the graph, intersections of separating elements constituted by
horizontal and vertical white IWRs are determined. The intersection point of two
IWRs is indicated by a small black square representing a vertex 81 in the
graph. Edges 82 that represent lines that separate the fields in the page are
constructed by connecting pairs of vertices 81 via "field separators". The edges 82 of
the graph are shown in white. The distance between the two vertices of an edge, i.e.
the length, is assigned as weight to the edge for further processing. In an alternative
embodiment a different parameter is used for assigning the weight, e.g. the colour of
the pixels. An algorithm for constructing the graph is as follows.
At the beginning, the following notation and definitions for IWRs are given. Let
R = {r_1, ..., r_m} be the non-empty and finite set of all IWRs obtained from a given image
I, where each IWR r_t is specified by the x- and y-coordinates of its top left corner and
bottom right corner ((x_1^(t), y_1^(t)), (x_2^(t), y_2^(t))), t = 1, 2, ..., m. Each
rectangle r_t is classified as horizontal, vertical or square based on the ratio of its
height and width. H = {h_1, ..., h_l}, V = {v_1, ..., v_k}, and S = {s_1, ..., s_d} denote the
subsets of horizontal, vertical and square IWRs, respectively, such that
H ∪ V ∪ S = R and m = l + k + d, and
H ∩ V = ∅, V ∩ S = ∅, H ∩ S = ∅,
where it is assumed that
H ≠ ∅, V ≠ ∅.
Further the contents of S are ignored and only the subsets H and V are used. This is
based on the consideration that in most cases white spaces that form the border of
text or non-text blocks are oblong vertical or horizontal areas. Let h be in H with
coordinates ((x_1,y_1),(x_2,y_2)) and v in V with coordinates ((a_1,b_1),(a_2,b_2)). Then h and v
have overlap if

[overlap condition illegible in the source]

By the intersection point of h and v in case of overlap, we take the unique point P
defined by the coordinates:

x_P = ½ (max{x_1, a_1} + min{x_2, a_2}),
y_P = ½ (max{y_1, b_1} + min{y_2, b_2}).


For IWRs only two from all possible types of overlap occur, namely overlap resulting
in a rectangle and overlap resulting in a point. Line overlap cannot occur, because
this would be in contradiction with the concept of the MWRs.

Figure 9 shows two types of intersection of maximal rectangles. For
constructing the graph the intersection points of vertical and horizontal informative
maximal rectangles are determined to find the position of vertices of the graph, i.e. to
determine the exact coordinates of the vertices. The left part of the Figure shows a
first type of intersection of a vertical IWR v and a horizontal IWR h, which results in a
rectangular area 86 with a center of intersection point P. The right part of the Figure
shows a second type of intersection of a vertical IWR v and a horizontal IWR h, that
results in a single intersection point 89 with a center of intersection at P'.

An algorithm for constructing the graph based on the intersection points is as
follows.
P = {p_1, ..., p_N} denotes the set of all intersection points of vertical IWRs and
horizontal IWRs, where each p in P is specified by its x- and y-coordinates (x_p, y_p),
where p = 1, ..., N. Let the set P be found, and G = (X, A) an undirected graph having
correspondence to P. The graph G = (X, A) consists of a finite number of vertices X
which are directly related to the intersection points and a finite number of edges A
which describe the relation between intersection points. Mathematically this is
expressed as

G(P) = (X(P), A(P × P)),
P: H × V → {x_p, y_p},

where

X = {1, ..., N} and
A = ({1, ..., N} × {1, ..., N}) with

A(i,j) = ∞ if i and j are not 4-chain connected,
A(i,j) = d_ij if i and j are 4-chain connected,

where d_ij indicates the Euclidean distance between points i and j, and where 4-chain
connected means that the vertices of a rectangular block are connected in four
possible directions of movement. In the above, two points i and j are 4-chain
connected if they can be reached by walking around with the aid of 4-connected
chain codes with min d_ij in one direction.
The graph as constructed may now be further processed for classifying the
areas within the graph as text blocks or a similar classification depending on the type
of picture. In an embodiment the graph is augmented by including foreground
separators, e.g. black lines or patterned lines such as dashed/dotted lines, in the
analysis. Also, edges of photos or graphic objects which are detected can be
included in the analysis.

The present segmenting method may also include a step of removing
foreground separators. First, foreground separators are recognized and
reconstructed as single objects. The components that constitute a patterned line are
connected by analyzing element heuristics, spatial relation heuristics and line
heuristics, i.e. building a combined element in a direction and detecting if it classifies
as a line. A further method for reconstructing a solid line from a patterned line is
down-sampling and/or using the Run Length Smoothing Algorithm (RLSA) as
described by K.Y. Wong, R.G. Casey, F.M. Wahl in "Document analysis system", IBM
J. Res. Dev. 26 (1982) 647-656. After detecting the foreground separators they are
replaced by background pixels. The effect is that larger maximal white rectangles can
be constructed, and that any other suitable method using the background pixel
property for finding background separators is supported as well.
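
The run length smoothing idea mentioned above can be illustrated with a small Python sketch that closes short background gaps along one pixel row, which is how a dashed line becomes a solid one. The function name `rlsa_row`, the parameter `max_gap` and the choice to fill only gaps bounded by foreground on both sides are assumptions for this example; published RLSA variants differ in such details.

```python
def rlsa_row(row, max_gap):
    """Fill background runs (0) of length <= max_gap lying between foreground pixels (1)."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen so far
    for i, value in enumerate(row):
        if value == 1:
            gap = i - last_fg - 1 if last_fg is not None else 0
            if 0 < gap <= max_gap:
                # Short gap between two foreground pixels: smooth it over.
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out
```

Applying the function to every row (and, if desired, every column) of a binary image would give a two-dimensional smoothing; that extension is left out here.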

Figure 11 shows a diagram of a method of defining fields on the basis of field
separators.

Basically, the task of this method is to define fields in an image, wherein fields
are defined as areas containing interrelated foreground elements, e.g. text blocks in
a newspaper image. The fields in an image are separated by field separators that are
understood to be geometrical lines having a direction and zero thickness. Field
separators correspond to areas of connected background pixels that have an oblong
shape in a separation direction, usually horizontal or vertical. The crossing points of
the field separators are called nodes. According to the method, first the field
separators in the image are detected, and then the fields are determined on the basis
of an analysis of the field separators.

In a SEPAR step 95 the image is analyzed to derive field separators. The field
separators are preferably based on the analysis using maximal white rectangles as
described above. The analysis using maximal white rectangles delivers a graph
having edges and vertices where the edges connect. For the method of the present
invention, the field separators and nodes correspond to the edges and the vertices of
the graph, respectively. Also, other suitable methods may be used for determining
field separators. It is noted that the process of deriving separators may already have
been completed earlier, or the image may be a representation of a structure on a higher
level that already shows separators.

The field separators thus found may slightly deviate from the basic horizontal
and vertical directions, e.g. as a consequence of scan misalignments, and this could
lead to errors in the further processing steps. Therefore, a "snap to grid" step, forcing
small deviations of the X- or Y-coordinate of a field separator to zero, may be added
to the process at this point.
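
A "snap to grid" step of this kind might look like the following Python sketch. The function name `snap_separator`, the tolerance `tol` (in pixels) and the choice to keep the coordinate of the first end point are assumptions for illustration only; the description does not prescribe them.

```python
def snap_separator(p, q, tol=2):
    """Force a nearly vertical or nearly horizontal segment p-q onto the axis direction."""
    (x1, y1), (x2, y2) = p, q
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    if dx <= tol and dx <= dy:
        # Nearly vertical: remove the small deviation in the X-coordinate.
        return (x1, y1), (x1, y2)
    if dy <= tol:
        # Nearly horizontal: remove the small deviation in the Y-coordinate.
        return (x1, y1), (x2, y1)
    # Clearly oblique segments are returned unchanged.
    return p, q
```

Storing the original end points next to the snapped ones would allow the snapping to be undone later when the fields are mapped back to the image.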

In a TESS step 96 a transformation to a building block level is performed. In
this step the image is divided into basic rectangles that form the building blocks of
fields in the image, by extending the field separators until they meet the outer border
of the image. In this way a so-called tessellation grid is formed, and the areas
enclosed by the (extended) field separators are defined as basic rectangles.

The generation of the tessellation grid is explained in detail below with
reference to Figures 12 and 13.

Basically, the method now connects the basic rectangles that are not
separated by a field separator into fields. A particularly efficient way to perform this
process includes the following steps.

In a MATRIX step 97 a new representation of the tessellated image is made
in the form of a matrix map. In the matrix map, the basic rectangles and the
tessellation grid elements are represented by the matrix elements. This step is further
described below with reference to Figure 14.

In a CONN step 98 the basic rectangles are connected to form areas of
connected basic rectangles. Basic rectangles are considered connected if they are
separated by an extended part of a line, and not connected if separated by a line part
associated with a field separator. A connected component algorithm is used in this step
as described below with reference to Figure 14.
The sets of connected basic rectangles as determined in this step now correspond to
the fields of the original image.

In a NODE step 99 the original nodes that border the fields found in the
CONN step are retrieved for defining the positions of the fields in the original image.

Finally, in a FIELD step 100 the original nodes retrieved in the previous step are
combined into a data structure defining a field for each area of connected basic
rectangles. This amounts to a transformation from the matrix representation back to
the pixel domain. This step is further described below with reference to Figures 15 -
17.

The TESS step of the algorithm will now be described in greater detail.

Figure 12 shows a representation of an image. The image is represented by
lines associated with field separators 110 that enclose the fields 109. Field separators
110 represent background, usually white in a newspaper, and are shown as black
lines. The foreground areas between the field separators, such as field 109 in this
example, are to be defined as fields. The task to be performed is identifying the fields
in the image.

Figure 13 shows a tessellation grid on an image, based on the input image of
Figure 12. For generating the tessellation grid, all field separators (uninterrupted lines
110 in Figure 13) have been extended up to the borders of the image. As a result, the
image is subdivided by vertical lines into 4 X-segments $\Delta X_1$ to $\Delta X_4$ and by horizontal
lines into 6 Y-segments $\Delta Y_1$ to $\Delta Y_6$. Extensions of field separators 110 are indicated by
dashed lines 111. For example, nodes 2 and 6 are actual nodes of a field separator
and the extension causes a virtual node 116 in between nodes 2 and 6. Two basic
rectangles are formed in the area directly to the right of the line between nodes 2 and
6. Every rectangle in the tessellation grid formed by the lines based on extending the
field separators is a so-called basic rectangle. For example the basic rectangle 113 is
part of a connected area as indicated by the shaded area, which is constituted by
every basic rectangle not separated from basic rectangle 113 by a field separator.
The area of connected basic rectangles can be constructed easily as is described
below with reference to Figure 14.
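
The generation of the tessellation grid can be sketched in Python as follows, assuming the separators have already been snapped to exactly horizontal or vertical segments. The function name `tessellation_grid`, the segment representation `((x1, y1), (x2, y2))` and the treatment of the image border as grid lines are assumptions made for this example.

```python
def tessellation_grid(separators, width, height):
    """Extend axis-aligned separators to the image border and list the basic rectangles."""
    xs, ys = {0, width}, {0, height}  # assumption: the image border belongs to the grid
    for (x1, y1), (x2, y2) in separators:
        if x1 == x2:
            xs.add(x1)  # a vertical separator yields a full-height grid line
        elif y1 == y2:
            ys.add(y1)  # a horizontal separator yields a full-width grid line
        else:
            raise ValueError("separator is not axis-aligned; snap it first")
    xs, ys = sorted(xs), sorted(ys)
    # Each cell between neighbouring grid lines is one basic rectangle (x1, y1, x2, y2).
    rects = [(xs[i], ys[j], xs[i + 1], ys[j + 1])
             for j in range(len(ys) - 1)
             for i in range(len(xs) - 1)]
    return xs, ys, rects
```

Only the distinct X- and Y-coordinates matter for the grid itself; which parts of a grid line are real separators and which are extensions is recorded later in the matrix map.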

It is noted that the approach may be extended to areas which are not
substantially rectangular structures. Piecewise linearization and/or elastic
deformation of the planar graph can be applied for processing images containing
"curved bordered" areas.

In the MATRIX step of the basic algorithm the tessellated image as shown in
Figure 13 is converted into a matrix representation, in which every basic rectangle
and every line segment is associated with a matrix element. The tessellated image
spans 4 basic rectangles and 5 vertical lines associated with field separators when
traversed in horizontal direction and, accordingly, the matrix representation has 9
columns. The tessellated image spans 6 basic rectangles and 7 horizontal lines when
traversed in vertical direction and, accordingly, the matrix representation has 13 rows.

Initially, every matrix element is given the value 1. Then, all matrix elements
are systematically checked for being associated with a field separator of the original
image and, if so, are changed in value to 0. Thus, a foreground element is
represented by a 1 and a background element by a 0.

Alternatively, matrix elements may be changed to 0 by checking the list of
field separators, which would normally result in fewer operations.

Figure 14 shows the resulting matrix map 120 of the image in Figure 13. E.g.,
the basic rectangle 113 is now reduced to a single element 123 of the matrix and
extended line segment 111 is now element 121 of the matrix. Nodes 2 and 6 are
represented by elements 124 and 125. Also shown is the matrix element
corresponding to virtual node 116. This element has the value 0, because it is part of
a field separator. It is to be noted that the geometrical shape is not preserved,
because the lengths of the lines between nodes are not taken into account. The
relation between the original nodes in the representation of the image and the
tessellation grid is stored separately as described below with reference to Figure 17.

The area 109 (Figure 12) is shown in Figure 14 as a shaded area 122 of
elements all being 1.

In the CONN step of the algorithm, the matrix map as generated is
subsequently subjected to a connected component process for finding sets of
connected elements having a value of 1 in the matrix. Connected component
algorithms are widely known in the literature and will therefore not be described here
further.
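
For readers who want a concrete picture of the CONN step, the following Python sketch labels connected elements of value 1 with a breadth-first search. It is one of many known connected component algorithms, not necessarily the one intended here; the function name `label_components` and the use of 4-connectivity are assumptions.

```python
from collections import deque


def label_components(m):
    """Label 4-connected sets of elements with value 1; returns (labels, count)."""
    rows, cols = len(m), len(m[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not part of any area"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if m[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1  # start a new connected area at an unlabeled 1-element
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and m[nr][nc] == 1 and labels[nr][nc] == 0):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count
```

Each label then corresponds to one set of connected basic rectangles, i.e. to one field of the original image.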

The NODE step of the algorithm is now described in more detail. As an
example, Figure 15 shows a single connected area 130 in the matrix of Figure 14. The
matrix shown is based on the tessellation grid as described above, but only
connected area 130 as detected by the connected components process is indicated
by a shaded area. The constituting elements of the connected area have a value of 1
and are surrounded by elements of a value of zero. In the following steps a field is
defined based on a contour around the connected area.

Figure 16 shows the contour 140 of a connected area. The contour is
indicated by a shaded area of values 1 around an area having values 0
corresponding to connected area 130. For finding the contour, first the area 130 is
dilated by one pixel, and then the original area is subtracted.
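
The dilate-and-subtract operation can be sketched in Python as follows. The function name `contour` and the use of a 3 x 3 neighbourhood (so that diagonal neighbours, which hold the corner nodes, are included) are assumptions for this example.

```python
def contour(area):
    """Return the one-element-wide ring around a binary area: dilation minus the area."""
    rows, cols = len(area), len(area[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if area[r][c] == 1:
                continue  # elements of the original area are subtracted
            # An element belongs to the dilated area if any 3 x 3 neighbour is set.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and area[nr][nc] == 1:
                        out[r][c] = 1
    return out
```

The result has the value 1 on the contour and 0 elsewhere, including inside the connected area, which matches the shading described for Figure 16.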

Figure 17 shows a node matrix. The matrix has the same dimension as the
matrix map. The value of the elements is either a node number (between 0 and 19)
or empty. The node numbers refer to the nodes in the original image as shown in
Figure 12. The contour 140 of connected area 130 derived above is projected on the
node matrix and shown by a shaded area 141.

The node matrix is constructed as follows. Initially, the value of the elements
is set to 'empty'. Then actual nodes of field separators are entered into the matrix,
e.g. on the basis of the vertex list of the graph.

The task is to extract all nodes belonging to the contour 140 of the area 130.
The nodes present in the contour are retrieved by tracing the contour and denoting
the nodes therein.
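
Projecting a contour on the node matrix might be sketched in Python as below. The function name `nodes_on_contour` and the use of `None` for 'empty' are assumptions; note also that this sketch visits the elements in raster order rather than tracing along the contour, so the returned nodes are not yet ordered around the field.

```python
def nodes_on_contour(contour_mask, node_matrix):
    """Collect the node numbers found at contour positions (raster order, unordered)."""
    found = []
    for r, row in enumerate(contour_mask):
        for c, on_contour in enumerate(row):
            node = node_matrix[r][c]
            # Compare with None explicitly, because 0 is a valid node number.
            if on_contour == 1 and node is not None:
                found.append(node)
    return found
```

Virtual nodes never appear in the result, because only actual nodes of field separators are entered into the node matrix.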

After tracing the contour the nodes are coupled to the original image
representation in the FIELD step of the algorithm. If necessary an inverse of the
"snap-to-grid" process is applied, and the node numbers are coupled again with the
original set of nodes. Finally, if required, the nodes and/or edges of a field are
ordered, e.g. in clockwise direction. The ordering may be required for area
computation or displaying.
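
To illustrate why the ordering matters for area computation, the following Python sketch uses the well-known shoelace formula, which only gives the polygon area when the vertices are listed in order around the boundary. The function name `polygon_area` and the example coordinates are hypothetical and not taken from the description.

```python
def polygon_area(vertices):
    """Area of a simple polygon whose vertices are listed in boundary order."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    # The sign depends on the orientation; the absolute value is the area.
    return abs(total) / 2.0
```

For a 40 by 100 field the ordered corner list gives 4000, whereas the same corners in an arbitrary order generally give a wrong value.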

The node extraction and field determination must of course be performed for
all fields in the image.

It is noted that areas may enclose each other, which results in disjunct
polygons, e.g. a text encirclement. In order to be able to operate on areas bounded
by multiple disjunct polygons, a known technique connecting those polygons is used.
The two contours of the polygons are connected by a so-called "zero area bridge",
actually 2 line segments, one entering and one leaving the inner contour.

Figure 10 shows a device wherein the method for segmenting a picture in
accordance with the present invention is implemented. The device has an input unit
91 for entering a digital image. The input unit may comprise a scanning unit for
scanning an image from paper such as an electro-optical scanner, or a digital
communication unit for receiving the image from a network like the internet, or a
playback unit for retrieving digital information from a record carrier like an optical disc
drive. The input unit 91 is coupled to a processing unit 94, which cooperates with a
memory unit 92. The processing unit may comprise a general purpose computer
central processing unit (CPU) and supporting circuits and operates using software for
performing the segmentation as described above. In particular, the software includes
modules (not separately shown in the Figure) for constructing the tessellation grid by
extending the field separators to the outer borders of the image, constructing the
basic rectangles and constructing the fields by connecting adjacent basic rectangles
that are not separated by a field separator. In addition, the software includes modules
for constructing a matrix map representing the tessellation grid and constructing a
node matrix related to the nodes in the tessellation grid.

The processing unit may further include a user interface 95 provided with
control means such as a keyboard, a mouse device or operator buttons. The output
of the processing unit is coupled to a display unit 93. In an embodiment the display
unit is a printing unit for outputting a processed image on paper, or a recording unit
for storing the segmented image on a record carrier like a magnetic tape or optical
disk.

Although the invention has been mainly explained by embodiments using a
newspaper page as the digital image to be segmented, the invention is also suitable
for any digital representation comprising fields on a background, such as electrical
circuits in layout images for IC design or streets and buildings on city maps. Further it
is noted that the graph as starting point for executing the segmenting by shortest
cycles may be constructed differently than the graph described above based on the
MWR system. For example a graph may be constructed using tiles as described in
the article by Antonacopoulos mentioned above. Further the weight assigned to an
edge in the graph is not necessarily the distance; it must be selected to correspond
to a contribution to the shortest cycle, for example the weight may be the surface of
the tile. It is noted that in this document the use of the verb 'comprise' and its
conjugations does not exclude the presence of other elements or steps than those
stated and the word 'a' or 'an' preceding an element does not exclude the presence of
a plurality of such elements, that any reference signs do not limit the scope of the
claims, that the invention and every unit or means mentioned may be implemented
by suitable hardware and/or software and that several 'means' or 'units' may be
represented by the same item. Further, the scope of the invention is not limited to the
embodiments, and the invention lies in each and every novel feature or combination
of features described above.

1. Method of segmenting a composite image of pixels into a number of fields
corresponding to layout elements of the image, the pixels having a value
representing the intensity and/or color of a picture element, which method comprises

- finding field separators corresponding to areas of adjacent pixels of the image,
having a predefined property indicative of a background of the image,
and is characterized by the further steps of

- extending the field separators along at least one separation direction to an outer
border of the image,

- constructing a tessellation grid of lines corresponding to the (extended) field
separators,

- constructing a set of basic rectangles, a basic rectangle being an area enclosed by
lines of the tessellation grid, and

- constructing the fields by connecting basic rectangles that are adjacent and not
separated by a field separator.

2. Method as claimed in claim 1, wherein
the step of constructing the set of basic rectangles comprises constructing a matrix
map representing the tessellation grid by a two-dimensional array of elements that
each represent either a basic rectangle or a line segment of the tessellation grid, an
element having a first predefined value for representing a line corresponding to a
field separator or a further, different, value for representing a basic rectangle or a line
corresponding to an extended field separator.

3. Method as claimed in claim 2, wherein 

the step of constructing the fields comprises connecting elements in the matrix map 
that have said further, different, value. 

4. Method as claimed in claim 1, 2 or 3, wherein nodes are defined at points
where the field separators connect, and wherein
the step of constructing the fields comprises constructing a node matrix
corresponding to the tessellation grid and including elements referring to nodes in the
tessellation grid.

- extending the field separators along at least one separation direction to an outer
border of the image,

- constructing a tessellation grid of lines corresponding to the (extended) field
separators,

- constructing a set of basic rectangles, a basic rectangle being an area enclosed by
lines of the tessellation grid, and

- constructing the fields by connecting basic rectangles that are adjacent and not
separated by a field separator.
12. Device as claimed in claim 11, wherein a processing unit (94) is arranged for

- constructing a matrix map representing the tessellation grid by a two-dimensional
array of elements that each represent either a basic rectangle or a line segment of
the tessellation grid, an element having a first predefined value for representing a line
corresponding to a field separator or a further different value for representing a basic
rectangle or a line corresponding to an extended field separator.

13. Device as claimed in claim 11 or 12, wherein a processing unit (94) is
arranged for

- constructing a node matrix corresponding to the tessellation grid and including
elements referring to nodes in the tessellation grid.

14. Device as claimed in claim 11, 12 or 13, wherein the device comprises a
display unit (93) for displaying fields of the image after segmenting.

5. Method as claimed in claim 4, wherein

- the step of constructing the fields comprises constructing a contour for each area of
connected elements in the matrix map and finding the nodes defining the field by
projecting the contour on the node matrix.

6. Method as claimed in claim 5, wherein said contour is constructed by dilating
the area and subtracting the area from the dilated area.

7. Method as claimed in any of the claims 1 to 6, wherein the segmenting
comprises

- constructing a graph, the graph having edges corresponding to areas of adjacent
pixels having a predefined property indicative of a background of the image and
vertices where the edges connect, and associating field separators to the edges of
the graph, and

- forming said tessellation grid by extending the field separators to an outer border of
the image.

8. Method as claimed in claim 7, wherein the constructing of the graph
comprises cleaning the graph by removing vertices that are connected to fewer than
two edges and/or removing any edges that connect to such vertices.

9. Method as claimed in any of the preceding claims, wherein the method
comprises snapping the lines in the tessellation grid to two orthogonal separation
directions.

10. Computer program product for segmenting an image of pixels into a number
of fields, which program is operative to cause a processor to perform the method as
claimed in any of the claims 1 to 9.

11. Device for segmenting a composite image of pixels into a number of fields
corresponding to layout elements of the image, the pixels having a value
representing the intensity and/or color of a picture element, which device comprises

- an input unit (91) for inputting an image, and

- a processing unit (94) for finding field separators corresponding to areas of adjacent
pixels having a predefined property indicative of a background of the image,
characterized in that the processing unit (94) is arranged for

ABSTRACT

A method is described for segmenting an image of pixels into a number of fields. First
the method finds field separators (110) using the background of the image, in
particular white areas between, for instance, text fields. The segmenting includes
constructing:

- a tessellation grid of lines and nodes formed by extending the field separators along
at least one separation direction (in practice: horizontal and vertical) to an outer
border of the image, and

- a set of basic rectangles enclosed by the lines of the tessellation grid of field
separators (110) and extended field separators (111).

Finally the fields are constructed by consolidating basic rectangles that are adjacent
without being separated by a field separator.